---
layout: layout.njk
permalink: "{{ page.filePathStem }}.html"
title: Smile - Vector Quantization
---
{% include "toc.njk" %}

<div class="col-md-9 col-md-pull-3">
    <h1 id="vector-quantization-top" class="title">Vector Quantization</h1>

    <p>Originally used for data compression, Vector quantization (VQ)
        allows the modeling of probability density functions by
        the distribution of prototype vectors. It works by dividing a large set of points
        (vectors) into groups having approximately the same number of
        points closest to them. Each group is represented by its centroid
        point, as in K-Means and some other clustering algorithms.
        Because of this reason, the algorithm discussed in this section are
        in the same package of clustering algorithms.</p>

    <p>Vector quantization is based on the competitive learning paradigm,
        and also closely related to sparse coding models
        used in deep learning algorithms such as autoencoder.</p>

    <h2 id="som">Self-Organizing Map</h2>

    <p>A Self-Organizing Map (SOM) is an unsupervised learning method to produce
        a low-dimensional (typically two-dimensional) discretized representation
        (called a map) of the input space of the training samples. The model was
        first described as an artificial neural network by Teuvo Kohonen, and is
        sometimes called a Kohonen map.</p>

    <p>While it is typical to consider SOMs as related to feed-forward networks where
        the nodes are visualized as being attached, this type of architecture is
        fundamentally different in arrangement and motivation because SOMs use a
        neighborhood function to preserve the topological properties of the input
        space. This makes SOMs useful for visualizing low-dimensional views of
        high-dimensional data, akin to multi-dimensional scaling.</p>

    <p>An SOM consists of components called nodes or neurons.
        Associated with each node is a weight vector of the same dimension as
        the input data vectors and a position in the map space. The usual arrangement
        of nodes is a regular spacing in a hexagonal or rectangular grid. During the (iterative) learning,
        the input vectors are compared to the weight vector of each neuron. Neurons
        who most closely match the input are known as the best match unit (BMU) of
        the system. The weight vector of the BMU and those of nearby neurons are
        adjusted to be closer to the input vector by a certain step size.</p>

    <p>There are two ways to interpret a SOM. Because in the training phase weights
        of the whole neighborhood are moved in the same direction, similar items
        tend to excite adjacent neurons. Therefore, SOM forms a semantic map where
        similar samples are mapped close together and dissimilar apart.
        The other way is to think of neuronal weights as pointers to the input space.
        They form a discrete approximation of the distribution of training samples.
        More neurons point to regions with high training sample concentration and
        fewer where the samples are scarce.</p>

    <p>SOM may be considered a nonlinear generalization of PCA.
        It has been shown, using both artificial and real
        geophysical data, that SOM has many advantages over the conventional feature
        extraction methods such as Empirical Orthogonal Functions (EOF) or PCA.</p>

    <p>It has been shown that while SOMs with a small number of nodes behave in a
        way that is similar to K-means. However, larger SOMs rearrange data
        in a way that is fundamentally topological in character and display properties
        which are emergent. Therefore, large maps are preferable to smaller ones.
        In maps consisting of thousands of nodes, it is possible to perform cluster
        operations on the map itself.</p>

    <p>A common way to display SOMs is the heat map of U-matrix. The U-matrix value
        of a particular node is the minimum/maximum/average distance between the node
        and its closest neighbors. In a rectangular grid for instance, we might
        consider the closest 4 or 8 nodes. It is common to use the U-Matrix to
        visualize an SOM. The U-Matrix value of a particular node is the average
        distance between the node's weight vector and that of its closest neighbors.
        In a square grid, for instance, we might consider the closest 4 or 8 nodes
        (the Von Neumann and Moore neighborhoods, respectively), or six nodes in
        a hexagonal grid.</p>

    <div style="width: 100%; display: inline-block; text-align: center;">
        <img src="images/t4.8k.png" class="enlarge" style="width: 480px;" />
        <div class="caption" style="min-width: 480px;">Chameleon t4.8k</div>
    </div>

    <p>In what follows, we apply SOM to a complicated data from Chameleon
        data set, as shown in the above. The SOM has <code>20 x 20</code>
        neurons.</p>

    <ul class="nav nav-tabs">
        <li class="active"><a href="#java_1" data-toggle="tab">Java</a></li>
        <li><a href="#scala_1" data-toggle="tab">Scala</a></li>
    </ul>
    <div class="tab-content">
        <div class="tab-pane" id="scala_1">
            <div class="code" style="text-align: left;">
    <pre class="prettyprint lang-scala"><code>
    val x = read.csv("data/clustering/chameleon/t4.8k.txt", header=false, delimiter=" ").toArray

    val epochs = 20
    val lattice = SOM.lattice(20, 20, x)
    val som = new SOM(lattice,
                TimeFunction.constant(0.1),
                Neighborhood.Gaussian(1, x.length * epochs / 4))

    (1 to epochs).foreach { _ =>
        MathEx.permutate(x.length).foreach { i =>
            som.update(x(i));
        }
    }

    show(hexmap(som.umatrix, Palette.jet(256)))
    </code></pre>
            </div>
        </div>
        <div class="tab-pane active" id="java_1">
            <div class="code" style="text-align: left;">
          <pre class="prettyprint lang-java"><code style="white-space: preserve nowrap;">
    import smile.util.*;
    import smile.vq.*;
    var x = Read.csv("data/clustering/chameleon/t4.8k.txt", CSVFormat.DEFAULT.withDelimiter(' ')).toArray();

    var epochs = 20;
    var lattice = SOM.lattice(20, 20, x);
    var som = new SOM(lattice,
                TimeFunction.constant(0.1),
                Neighborhood.Gaussian(1, x.length * epochs / 4));

    for (int e = 0; e < epochs; e++) {
        for (int i : MathEx.permutate(x.length)) {
            som.update(x[i]);
        }
    }

    Hexmap.of(som.umatrix(), Palette.jet(256)).canvas().window();
          </code></pre>
            </div>
        </div>
    </div>

    <p>The U-Matrix is visualized with a hexmap, where a dark red coloring between
        the neurons corresponds to a large distance and thus a gap between
        the codebook values in the input space. A dark blue coloring between
        the neurons signifies that the codebook vectors are close to each other
        in the input space. The blueish areas can be thought as clusters and
        reddish areas as cluster separators. This can be a helpful presentation
        when one tries to find clusters in the input data without having any
        <i>a priori</i> information about the clusters.</p>

    <div style="width: 100%; display: inline-block; text-align: center;">
        <img src="images/som-umatrix-hexmap.png" class="enlarge" style="width: 480px;" />
        <div class="caption" style="min-width: 480px;">SOM U-Matrix</div>
    </div>

    <p>One may also fit a Gaussian mixture model on the U-Matrix values.
        Our implementation of EM algorithm to fix mixture models can automatically
        determine the number of components in the mixture. The distribution
        information can be used in other algorithms such as Guassian kernel smooth
        parameters, nearest neighbor range, etc.</p>

    <ul class="nav nav-tabs">
        <li class="active"><a href="#java_2" data-toggle="tab">Java</a></li>
        <li><a href="#scala_2" data-toggle="tab">Scala</a></li>
    </ul>
    <div class="tab-content">
        <div class="tab-pane" id="scala_2">
            <div class="code" style="text-align: left;">
    <pre class="prettyprint lang-scala"><code>
    val dist = som.umatrix.flatten
    val canvas = hist(dist, 100, true)

    val mixture = GaussianMixture.fit(dist)

    val minDist = MathEx.min(dist)
    val w = (MathEx.max(dist) - minDist) / 50
    val pdf = (0 to 50).map { i =>
      val x = minDist + i * w
      val y = mixture.p(x) * w
      Array(x, y)
    }.toArray

    canvas.add(LinePlot.of(pdf, Color.RED))
    show(canvas)
    </code></pre>
            </div>
        </div>
        <div class="tab-pane active" id="java_2">
            <div class="code" style="text-align: left;">
          <pre class="prettyprint lang-java"><code>
    var dist = Arrays.stream(som.umatrix()).flatMapToDouble(r -> Arrays.stream(r)).toArray();
    var canvas = smile.plot.swing.Histogram.of(dist, 100, true).canvas();

    var mixture = GaussianMixture.fit(dist);

    var minDist = MathEx.min(dist);
    var w = (MathEx.max(dist) - minDist) / 50;
    var pdf = new double[51][2];
    for (int i = 0; i < pdf.length; i++) {
      pdf[i][0] = minDist + i * w;
      pdf[i][1] = mixture.p(pdf[i][0]) * w;
    }

    canvas.add(LinePlot.of(pdf, Color.RED));
    canvas.window();
          </code></pre>
            </div>
        </div>
    </div>

    <div style="width: 100%; display: inline-block; text-align: center;">
        <img src="images/som-umatrix-hist.png" class="enlarge" style="width: 480px;" />
        <div class="caption" style="min-width: 480px;">SOM U-Matrix Value Distribution</div>
    </div>

    <p>It is also common to visualize an SOM by mapping the neuron
        codebook vectors to a lower dimensional space (e.g. by Sammon's mapping).</p>

    <ul class="nav nav-tabs">
        <li class="active"><a href="#java_3" data-toggle="tab">Java</a></li>
        <li><a href="#scala_3" data-toggle="tab">Scala</a></li>
    </ul>
    <div class="tab-content">
        <div class="tab-pane" id="scala_3">
            <div class="code" style="text-align: left;">
    <pre class="prettyprint lang-scala"><code>
    val codebook = som.neurons.flatten
    val codebook2d = sammon(pdist(codebook, false).toArray(), 2).coordinates

    val nodes = 20
    val neurons = (0 until nodes).map { i => codebook2d.slice(nodes*i, nodes*(i+1)) }.toArray
    show(grid(neurons))
    </code></pre>
            </div>
        </div>
        <div class="tab-pane active" id="java_3">
            <div class="code" style="text-align: left;">
          <pre class="prettyprint lang-java"><code>
    var codebook = Arrays.stream(som.neurons()).flatMap(r -> Arrays.stream(r)).toArray(double[][]::new);
    var codebook2d = SammonMapping.of(MathEx.pdist(codebook, false).toArray(), 2).coordinates;

    var nodes = 20;
    var neurons = new double[nodes][nodes][];
    for (int i = 0, k = 0; i < nodes; i++) {
        for (int j = 0; j < nodes; j++, k++) {
            neurons[i][j] = codebook2d[k];
        }
    }
    Grid.of(neurons).canvas().window();
          </code></pre>
            </div>
        </div>
    </div>

    <p>Because this artificial data is 2-dimensional, it is not necessary
        to map the neurons in the SOM to a lower dimensional space.
        For demonstration, we still go through the whole process with a Sammon's mapping.</p>

    <div style="width: 100%; display: inline-block; text-align: center;">
        <img src="images/som-sammon.png" class="enlarge" style="width: 480px;" />
        <div class="caption" style="min-width: 480px;">SOM Neuron Grid after Sammon's Mapping</div>
    </div>

    <h2 id="neural-gas">Neural Gas</h2>

    <p>The Neural Gas algorithm is inspired
        by the SOM for finding optimal data representations based on
        feature vectors. The algorithm was coined "Neural Gas" because of the
        dynamics of the feature vectors during the adaptation process, which
        distribute themselves like a gas within the data space. Although it is mainly
        applied where data compression or vector quantization is an issue,
        it is also used for cluster analysis as a robustly converging alternative to
        the k-means clustering.</p>

    <p>Compared to SOM, neural gas has no topology of a fixed dimensionality
        (in fact, no topology at all). For each input signal during learning, the
        neural gas algorithm sorts the neurons of the network according to the
        distance of their reference vectors to the input signal. Based on this
        "rank order", neurons are adapted based on the adaptation strength that are
       decreased according to a fixed schedule.</p>

    <p>The adaptation step of the Neural Gas can be interpreted as gradient descent
        on a cost function. By adapting not only the closest feature vector but all
        of them with a step size decreasing with increasing distance order,
        compared to k-means clustering, a much more robust convergence of the
        algorithm can be achieved.</p>

    <ul class="nav nav-tabs">
        <li class="active"><a href="#java_4" data-toggle="tab">Java</a></li>
        <li><a href="#scala_4" data-toggle="tab">Scala</a></li>
    </ul>
    <div class="tab-content">
        <div class="tab-pane" id="scala_4">
            <div class="code" style="text-align: left;">
    <pre class="prettyprint lang-scala"><code>
    val epochs = 20
    val gas = new NeuralGas(NeuralGas.seed(400, x),
                TimeFunction.exp(0.3, x.length * epochs / 2),
                TimeFunction.exp(30, x.length * epochs / 8),
                TimeFunction.constant(x.length * 2))

    (1 to epochs).foreach { _ =>
        MathEx.permutate(x.length).foreach { i =>
            gas.update(x(i))
        }
    }

    val canvas = plot(x, '.')
    canvas.add(ScatterPlot.of(gas.neurons, '#', BLUE))
    show(canvas)
    </code></pre>
            </div>
        </div>
        <div class="tab-pane active" id="java_4">
            <div class="code" style="text-align: left;">
          <pre class="prettyprint lang-java"><code>
    var epochs = 20;
    var gas = new NeuralGas(NeuralGas.seed(400, x),
                TimeFunction.exp(0.3, x.length * epochs / 2),
                TimeFunction.exp(30, x.length * epochs / 8),
                TimeFunction.constant(x.length * 2));

    for (int e = 0; e < epochs; e++) {
        for (int i : MathEx.permutate(x.length)) {
            gas.update(x[i]);
        }
    }

    var canvas = ScatterPlot.of(x, '.').canvas();
    canvas.add(ScatterPlot.of(gas.neurons(), '#', Color.BLUE));
    canvas.window();
          </code></pre>
            </div>
        </div>
    </div>

    <p>In the plot, we draw the neurons as big blue dots.</p>

    <div style="width: 100%; display: inline-block; text-align: center;">
        <img src="images/neural-gas.png" class="enlarge" style="width: 480px;" />
        <div class="caption" style="min-width: 480px;">Neural Gas</div>
    </div>

    <h2 id="growing-neural-gas">Growing Neural Gas</h2>

    <p>A prominent extension is the Growing Neural Gas (GNG).
        GNG can add and delete nodes during algorithm execution.
        The growth mechanism is based on growing cell structures
        and competitive Hebbian learning.</p>

    <p>Compared to Neural Gas, GNG has the following distinctions:</p>
    <ul>
        <li>The system has the ability to add and delete nodes.</li>

        <li>Local Error measurements are noted at each step helping it to locally
            insert/delete nodes.</li>

        <li>Edges are connected between nodes, so a sufficiently old edges is
            deleted. Such edges are intended placeholders for localized
            data distribution.</li>

        <li>Such edges also help to locate distinct clusters (those clusters are
            not connected by edges).</li>
    </ul>

    <ul class="nav nav-tabs">
        <li class="active"><a href="#java_5" data-toggle="tab">Java</a></li>
        <li><a href="#scala_5" data-toggle="tab">Scala</a></li>
    </ul>
    <div class="tab-content">
        <div class="tab-pane" id="scala_5">
            <div class="code" style="text-align: left;">
    <pre class="prettyprint lang-scala"><code>
    val gng = new GrowingNeuralGas(x(0).length)

    (1 to 10).foreach { _ =>
        MathEx.permutate(x.length).foreach { i =>
            gng.update(x(i))
        }
    }

    val neurons = gng.neurons
    val canvas = plot(x, '.')
    canvas.add(ScatterPlot.of(neurons.map(_.w).toArray, '@', BLUE))

    import scala.jdk.CollectionConverters._
    neurons.foreach { neuron =>
      neuron.edges.asScala.foreach { edge =>
        canvas.add(LinePlot.of(Array(neuron.w, edge.neighbor.w), BLUE))
      }
    }
    show(canvas)
    </code></pre>
            </div>
        </div>
        <div class="tab-pane active" id="java_5">
            <div class="code" style="text-align: left;">
          <pre class="prettyprint lang-java"><code style="white-space: preserve nowrap;">
    var gng = new GrowingNeuralGas(x[0].length);

    for (int e = 0; e < epochs; e++) {
        for (int i : MathEx.permutate(x.length)) {
            gng.update(x[i]);
        }
    }

    var neurons = gng.neurons();
    var canvas = ScatterPlot.of(x, '.').canvas();
    canvas.add(ScatterPlot.of(Arrays.stream(neurons).map(n -> n.w).toArray(double[][]::new), '@', Color.BLUE));

    Arrays.stream(neurons).forEach(neuron -> {
        neuron.edges.stream().forEach(edge -> {
            double[][] e = {neuron.w, edge.neighbor.w};
            canvas.add(LinePlot.of(e, Color.BLUE));
        });
    });
    canvas.window();
          </code></pre>
            </div>
        </div>
    </div>

    <p>As shown in the plot, GNG nicely capture the structure of data.
        However, there are also neurons fitting noises and connecting
        clusters that should be separated. A further graph cut/clustering
        may help removing these neurons of noise.</p>

    <div style="width: 100%; display: inline-block; text-align: center;">
        <img src="images/growing-neural-gas.png" class="enlarge" style="width: 480px;" />
        <div class="caption" style="min-width: 480px;">Growing Neural Gas</div>
    </div>

    <h2 id="neural-map">Neural Map</h2>

    <p>Neural Map is an efficient competitive learning algorithm inspired by growing
        neural gas and BIRCH. Like growing neural gas, Neural Map has the ability to
        add and delete neurons with competitive Hebbian learning. Edges exist between
        neurons close to each other. Such edges are intended placeholders for
        localized data distribution. Such edges also help to locate distinct clusters
        (those clusters are not connected by edges). Neural Map employs Locality-Sensitive
        Hashing to speed up the learning while BIRCH uses balanced CF trees.</p>

    <ul class="nav nav-tabs">
        <li class="active"><a href="#java_6" data-toggle="tab">Java</a></li>
        <li><a href="#scala_6" data-toggle="tab">Scala</a></li>
    </ul>
    <div class="tab-content">
        <div class="tab-pane" id="scala_6">
            <div class="code" style="text-align: left;">
    <pre class="prettyprint lang-scala"><code>
    val cortex = new NeuralMap(10, 0.01, 0.002, 50, 0.995)

    (1 to 5).foreach { _ =>
        MathEx.permutate(x.length).foreach { i =>
            cortex.update(x(i))
        }

        // Removes staled neurons and the edges beyond lifetime.
        cortex.clear(1E-7)
    }

    val neurons = cortex.neurons
    val canvas = plot(x, '.')
    canvas.add(ScatterPlot.of(neurons.map(_.w).toArray, '@', BLUE))

    import scala.jdk.CollectionConverters._
    neurons.foreach { neuron =>
      neuron.edges.asScala.foreach { edge =>
        canvas.add(LinePlot.of(Array(neuron.w, edge.neighbor.w), BLUE))
      }
    }
    show(canvas)
    </code></pre>
            </div>
        </div>
        <div class="tab-pane active" id="java_6">
            <div class="code" style="text-align: left;">
          <pre class="prettyprint lang-java"><code style="white-space: preserve nowrap;">
    var cortex = new NeuralMap(10, 0.01, 0.002, 50, 0.995);

    for (int e = 0; e < 5; e++) {
        for (int i : MathEx.permutate(x.length)) {
            cortex.update(x[i]);
        }

        // Removes staled neurons and the edges beyond lifetime.
        cortex.clear(1E-7);
    }

    var neurons = cortex.neurons();
    var canvas = ScatterPlot.of(x, '.').canvas();
    canvas.add(ScatterPlot.of(Arrays.stream(neurons).map(n -> n.w).toArray(double[][]::new), '@', Color.BLUE));

    Arrays.stream(neurons).forEach(neuron -> {
        neuron.edges.stream().forEach(edge -> {
            double[][] e = {neuron.w, edge.neighbor.w};
            canvas.add(LinePlot.of(e, Color.BLUE));
        });
    });
    canvas.window();
          </code></pre>
            </div>
        </div>
    </div>

    <p>Note that points in orange are outliers labeled by the algorithm.</p>
    <div style="width: 100%; display: inline-block; text-align: center;">
        <img src="images/neural-map.png" class="enlarge" style="width: 480px;" />
        <div class="caption" style="min-width: 480px;">Neural Map</div>
    </div>

    <h2 id="birch">BIRCH</h2>

    <p>BIRCH (Balanced Iterative Reducing and Clustering using Hierarchies) performs
        hierarchical clustering over particularly large datasets. An advantage of
        BIRCH is its ability to incrementally and dynamically cluster incoming,
        multidimensional metric data points in an attempt to produce the high
        quality clustering for a given set of resources (memory and time constraints).</p>

    <p>BIRCH has several advantages. For example, each clustering decision is made
        without scanning all data points and currently existing clusters. It
        exploits the observation that data space is not usually uniformly occupied
        and not every data point is equally important. It makes full use of
        available memory to derive the finest possible sub-clusters while minimizing
        I/O costs. It is also an incremental method that does not require the whole
        data set in advance.</p>

    <p>This implementation produces a clustering in three steps. First step
        builds a CF (clustering feature) tree by a single scan of database.
        The second step clusters the leaves of CF tree by hierarchical clustering.
        Then the user can use the learned model to cluster input data in the final
        step. In total, we scan the database twice.</p>

    <p>A CF leaf will be treated as outlier if the number of its
        points is less than the parameter <code>minPts</code>.
        The branching factor parameter <code>branch</code> is the
        maximum number of children nodes and the parameter <code>radius</code>
        is the maximum radius of a sub-cluster.</p>

    <ul class="nav nav-tabs">
        <li class="active"><a href="#java_7" data-toggle="tab">Java</a></li>
        <li><a href="#scala_7" data-toggle="tab">Scala</a></li>
    </ul>
    <div class="tab-content">
        <div class="tab-pane" id="scala_7">
            <div class="code" style="text-align: left;">
    <pre class="prettyprint lang-scala"><code>
    val birch = new BIRCH(x(0).length, 6, 3, 10)
    x.foreach { xi =>
        birch.update(xi)
    }

    val canvas = plot(x, '.')
    canvas.add(ScatterPlot.of(birch.centroids, '#', BLUE))
    show(canvas)
    </code></pre>
            </div>
        </div>
        <div class="tab-pane active" id="java_7">
            <div class="code" style="text-align: left;">
          <pre class="prettyprint lang-java"><code>
    var birch = new BIRCH(x[0].length, 6, 3, 10);
    for (double[] xi : x) {
        birch.update(xi);
    }

    var canvas = ScatterPlot.of(x, '.').canvas();
    canvas.add(ScatterPlot.of(birch.centroids(), '#', Color.BLUE));
    canvas.window();
          </code></pre>
            </div>
        </div>
    </div>

    <div style="width: 100%; display: inline-block; text-align: center;">
        <img src="images/birch.png" class="enlarge" style="width: 480px;" />
        <div class="caption" style="min-width: 480px;">BIRCH</div>
    </div>

    <div id="btnv">
        <span class="btn-arrow-left">&larr; &nbsp;</span>
        <a class="btn-prev-text" href="clustering.html" title="Previous Section: Clustering"><span>Clustering</span></a>
        <a class="btn-next-text" href="association-rule.html" title="Next Section: Association Rule Mining"><span>Association Rule Mining</span></a>
        <span class="btn-arrow-right">&nbsp;&rarr;</span>
    </div>
</div>

<script type="text/javascript">
    $('#toc').toc({exclude: 'h1, h5, h6', context: '', autoId: true, numerate: false});
</script>
